ABSTRACT

While the Internet was conceived as a decentralized net- 
work, the most widely used web applications today tend 
toward centralization. Control increasingly rests with cen- 
tralized service providers who, as a consequence, have also 
amassed unprecedented amounts of data about the behav- 
iors and personalities of individuals. 

Developers, regulators, and consumer advocates have looked 
to alternative decentralized architectures as the natural re- 
sponse to threats posed by these centralized services. The 
result has been a great variety of solutions that include per- 
sonal data stores (PDS), infomediaries, Vendor Relationship 
Management (VRM) systems, and federated and distributed 
social networks. And yet, for all these efforts, decentralized 
personal data architectures have seen little adoption. 

This position paper attempts to account for these failures, 
challenging the accepted wisdom in the web community on 
the feasibility and desirability of these approaches. We start 
with a historical discussion of the development of various 
categories of decentralized personal data architectures. Then 
we survey the main ideas to illustrate the common themes 
among these efforts. We tease apart the design character- 
istics of these systems from the social values that they (are 
intended to) promote. We use this understanding to point 
out numerous drawbacks of the decentralization paradigm, 
some inherent and others incidental. We end with recom- 
mendations for designers of these systems for working to- 
wards goals that are achievable, but perhaps more limited 
in scope and ambition. 

1. BRIEF HISTORICAL OVERVIEW 

The search for alternatives to centralized aggregation of
personal data began in the late 1990s, which saw a wave of
so-called 'negotiated privacy techniques' including commercial
'infomediaries' [24] [16]. These entities would store consumers'
data and help facilitate the drafting of contracts
that set the terms of the exchange and use of data. The 
1999 book Net Worth [33] galvanized both industry and pri- 
vacy advocates, generating hopes for a future in which pri- 
vacy problems could be solved through a mix of decentral- 
ized storage and private contracts, potentially obviating the 
need for privacy law or even the adoption of fair information
practices [10] [60].



Within five years, nearly all of this excitement had faded and 
all commercial (Persona, Privada, Lumeria, etc.) and community
(P3P) initiatives had floundered [1], some in truly
spectacular fashion, such as AllAdvantage. And yet, by the
end of the decade, many new initiatives and projects that 
shared almost identical goals emerged. Vendor Relation- 
ship Management (VRM) [35] has gained steady momen- 
tum as a general set of principles that aim simultaneously 
to improve user privacy, enhance customer autonomy, and 
increase market efficiency through a combination of mecha- 
nisms that aggregate data in a single (per-user) repository 
under users' control and tools to negotiate agreements that 
would grant outside organizations access to and use of that 
data. 

Parallel efforts to develop so-called personal data stores (PDS), 
personal data servers, personal data lockers/vaults, and per- 
sonal clouds [TS] have focused more narrowly on the plat- 
forms and protocols to support unified repositories of user 
data that could be managed locally by the user or outsourced 
to a trusted third party. The motivations for these projects
are varied, ranging from user interest in aggregating one's own
data in a single location to better derive benefits from their
mixing and matching to more explicit interests in privacy
(user control) and commerce (a marketplace for sharing,
including possibilities for cash payments in exchange for data).

The similarities between these and earlier efforts can be 
quite stark: Mydex's recent white paper, "The Case for Per- 
sonal Information Empowerment" [38], recapitulates much 
that was described in a white paper released a full decade 
earlier by Lumeria, a failed infomediary [3D]. To describe 
this as a simple case of "an idea whose time has come" 
would be to miss the important lessons that these earlier 
and recurring failures should offer those who wish to pursue 
decentralized personal data architectures. 

Decentralized social networking has been a largely parallel, 
sometimes overlapping line of development with similar mo- 
tivations. We subdivide such social networks into federated 
(ecosystem of interoperable implementations in the client- 
server model) and distributed (peer-to-peer). The term dis- 
tributed social networking is frequently but incorrectly used 
to describe all decentralized social networks. 



While some early thinking in the semantic web community
could be classified in this category,^1 for the most part
decentralized social networking appears not to have anticipated
the success of mainstream commercial, centralized social 
networks, but rather developed as a response to it. Indeed, 
prominent members of the web community dismissed social 
networks until 2007-2008 (for example, [27] and [E]) and 
academic computer scientists appear to have considered it a 
passing fad as well — in our survey we see a sharp spike in 
interest among researchers around this time frame. 

A series of well-publicized privacy mishaps by Facebook and 
Google starting in 2009 that reached its crescendo around 
the 2010 f8 developer conference stirred up interest among 
the public and policymakers.^2 Perhaps the most well-known
project that resulted is Diaspora,^3 which was funded in
excess of $200,000 via the crowdfunding platform
kickstarter.com. As of this writing Wikipedia lists about 40
decentralized social networks [58], most of which are feder- 
ated, whereas the academic literature has focused on dis- 
tributed social networking for natural reasons, since those 
present more research challenges. 



2. REPRESENTATIVE SURVEY

Rather than attempt an exhaustive survey, in this section
we list the key ideas that have been explored in the course
of developing decentralized designs. There has been a great
fecundity of creative and complex ideas in this space spanning
the realms of technology, law and economics; we are
unable to present them in detail due to space constraints.
We refer the reader to the cited works.

The core idea of an infomediary is that of a trusted third
party that interfaces between the user and commercial entities
such as marketers [23]. Users' personal data can be manually
given to the infomediary, as in Lumeria, or collected
through passive monitoring, as in AllAdvantage and other
systems [20]. That information can then be utilized without
explicit monetization (Mydex, etc.), or users can be paid for
their data (AllAdvantage, Bynamite [29], etc.). It has variously
been argued that telecommunications providers [55]
[4], banks [9] and other parties such as providers of home
entertainment set-top boxes are ideally suited to play the
role of the intermediary. An infomediary might also enable
a targeted attention market [39] based on user preferences.

Kang et al. introduce the intriguing idea of licensing
intermediaries to increase their trustworthiness [28]. In the other
direction, Vendor Relationship Management systems largely
eliminate the infomediary as a separate entity, and instead
replace it with a software agent [35]. Some software
intermediaries like Adnostic use cryptography to achieve additional
privacy properties [54]. Other ideas for improving privacy
include fine-grained access control lists [37].

Both VRM and infomediary systems often emphasize benefits
to the firm from the intermediated nature of the exchange.
Goldman [21] envisions that software agents will
make marketing messages perfectly relevant, eliminating
externalities from wasted attention. By Coase's theorem [34],
this will lead to a socially optimal level of marketing.

Turning to social networks, the key challenge of distributed
social networks is hosting and message transfer. One solution
is to encrypt messages and store them in a distributed
hash table [8] [2]. Another is "social replication": messages
are stored in plaintext in a redundant manner by those who
have access rights (typically friends of the message poster)
[49]. Message passing sometimes exploits the relationship
between the social graph and the topology of the physical
network [25] [18].

Another frequent goal is keeping edges of the graph secret,
for which various solutions have been proposed: a cryptographic
approach [5], anonymous routing [14], and
friend-to-friend networks such as Freenet in 'darknet' mode [12].
Persona takes the cryptographic heavy-lifting a step further
to enable fine-grained access control using attribute-based
encryption [BJ.



1 The Internet Archive lists a version of the Friend of a 
Friend (FOAF) project (www.foaf-project.org) from August 
2003, and other efforts may be older. 

2 For an article typifying public opinion during that period,
see [45].

3 https://joindiaspora.com/



Other models for hosting have been explored. In vis-a-vis, 
each user owns an EC2 virtual host that is active at all
times [48], whereas FreedomBox^4 proposes cheap plug
computers. Lam et al. have proposed email as a backend [19]
and ephemeral networks on smartphones [17]. Unhosted^5
proposes separating data from code, but keeping both in
the cloud. Along similar lines, Frenzy^6 is a distributed
social network software with Dropbox as the backend. Polaris
proposes reducing existing social networks such as YouTube
and Twitter to datastores and layering a social network on
top, with smartphones providing access control management
interfaces [59].
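
One of the hosting approaches mentioned in this survey stores
encrypted messages in a distributed hash table. The following is
a minimal sketch of that idea, not the design of any cited system.
The class ToyDHT, the node names, the key format and the shared
secret are all hypothetical, and toy_cipher is a deliberately
simplistic placeholder that must not be mistaken for a secure
cipher.

```python
import hashlib
from bisect import bisect_right


def ring_position(data: bytes) -> int:
    """Map arbitrary bytes to a point on the hash ring."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def toy_cipher(secret: bytes, data: bytes) -> bytes:
    """XOR the data with a SHA-256-derived keystream.

    Placeholder only: NOT a secure cipher. Applying it twice with
    the same secret returns the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyDHT:
    """In-process stand-in for a DHT: each key lives on exactly one node."""

    def __init__(self, node_names):
        self.ring = sorted((ring_position(n.encode()), n) for n in node_names)
        self.points = [p for p, _ in self.ring]
        self.stores = {n: {} for n in node_names}

    def node_for(self, key: bytes) -> str:
        # First node clockwise from the key's position, wrapping around.
        idx = bisect_right(self.points, ring_position(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: bytes, blob: bytes) -> None:
        self.stores[self.node_for(key)][key] = blob

    def get(self, key: bytes) -> bytes:
        return self.stores[self.node_for(key)][key]


# Hypothetical usage: the poster encrypts before storing.
dht = ToyDHT(["node-a", "node-b", "node-c"])
friends_secret = b"key shared out of band with friends"
post_key = b"alice/post/1"
dht.put(post_key, toy_cipher(friends_secret, b"hello friends"))

# The hosting node holds only ciphertext; friends with the secret can read it.
stored = dht.get(post_key)
print(toy_cipher(friends_secret, stored))
```

In this sketch the node that happens to host a post learns nothing
from the stored bytes alone, but it still sees who requests them,
which is a separate concern from confidentiality of the content.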

Finally, federated social networks aim to create an ecosys- 
tem of standards-based interoperable implementations of so- 
cial networks. Some designs such as Diaspora are a hybrid 
between distributed and federated. OStatus, being coordi- 
nated by the W3C, represents an interesting approach to 
standardization for federated microblogging: it references a 
suite of existing protocols rather than developing them from 
scratch. 

3. CLASSIFICATION 

Table 1: The four types of architectures that are the
subject of our study

               Commerce, Health etc.    Social Networking
Self-hosted    PDS / VRM                Distributed
Outsourced     Infomediary              Federated


We emphasize that the division in Table 1 is only meant to
provide the reader with a rough mental map and is far from 
precise. The vertical axis, in particular, is closer to a spec- 
trum than a strict division. The terms Personal Data Store 
and Vendor Relationship Management do not appear to have 
a single definition. Also, some PDS projects are application- 



4 http://freedomboxfoundation.org/
5 http://unhosted.org/
6 http://frenzyapp.com/



agnostic, but these tend to be software libraries/platforms 
rather than complete user-facing systems. 

Towards a finer-grained classification and understanding of 
different projects, we propose the following (non-independent) 
axes that are components of what it means for an architec- 
ture to be decentralized. 

1. Locus of data hosting: this could be remote (centralized),
by a trusted third party (infomediary), distributed
(peer-to-peer), or local (i.e., on the user's device).

2. Open standards vs. proprietary. 

3. Open vs. closed-source implementations. 

4. Data portability: Data export (for users), APIs (for 
third parties), or none. 
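
The four axes above can be treated as a small data structure, which
makes it easy to see on which axes two systems differ. The sketch
below is only an illustration: the type names, the helper
differing_axes, and the two example profiles are hypothetical and do
not describe any real system. As a simplification, it records a
single portability value per system, although a system could offer
both export and APIs.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Hosting(Enum):
    REMOTE_CENTRALIZED = "remote (centralized)"
    TRUSTED_THIRD_PARTY = "trusted third party (infomediary)"
    DISTRIBUTED = "distributed (peer-to-peer)"
    LOCAL = "local (on the user's device)"


class Portability(Enum):
    NONE = "none"
    USER_EXPORT = "data export for users"
    THIRD_PARTY_API = "APIs for third parties"


@dataclass(frozen=True)
class ArchitectureProfile:
    """One value per axis of decentralization."""
    hosting: Hosting
    open_standards: bool
    open_source: bool
    portability: Portability


def differing_axes(a: ArchitectureProfile, b: ArchitectureProfile) -> list:
    """Return the names of the axes on which two profiles differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Two made-up systems, for illustration only.
system_x = ArchitectureProfile(
    Hosting.REMOTE_CENTRALIZED, False, False, Portability.USER_EXPORT
)
system_y = ArchitectureProfile(
    Hosting.LOCAL, True, True, Portability.USER_EXPORT
)
print(differing_axes(system_x, system_y))
```

Because the axes are not independent, a comparison of this kind is
best read as a description of where two designs diverge rather than
as a score of how decentralized either one is.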

The above are technical characteristics; one might also try 
to classify systems in terms of the social values they espouse. 
We discuss four in particular. 

1. Privacy: According to Nissenbaum [41] [40], systems
that attempt to preserve privacy should attempt to
preserve the integrity of the context in which actors
engage with each other. They should do this by ensuring
that information flows respect the norms of the
context. To the degree that systems better model and
mediate appropriate information flows, they will advance
the privacy interests of their users. This view
will inform the discussion in Section 4.1.

2. Utility: We refer to the overall social benefit of the system,
in the sense of welfare maximization in economics.
One way to achieve increased utility is through greater 
interoperability or data portability. 

3. Cost: Cost encompasses hosting and bandwidth costs 
as well as software development and maintenance costs. 
Centralized and decentralized systems behave very differently:
in the former case there is typically a single
entity that bears all the costs, whereas in the decentralized
setting the costs can be split among users and various
software creators and service providers. Comparing
these alternatives may therefore be tricky. 

4. Innovation: We must also consider how quickly differ- 
ent systems are able to evolve and adapt. Some have 
argued that open standards catalyze innovation while 
others point to the time and monetary costs of stan- 
dardization. The strength of the business model, the 
extent of market competition, the ability to harness 
and analyze data, and legal compliance requirements 
are some of the other factors that affect how conducive 
a system is to innovation. 
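
The view of privacy in the first item, in which information flows
should respect the norms of their context, can be made concrete
with a very small model. This is a sketch under strong assumptions:
the Flow fields, the contexts, the roles, and the NORMS set are all
hypothetical, and real norms are far richer than a lookup table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One transfer of information between two roles in a context."""
    context: str
    sender: str
    recipient: str
    info_type: str


# Hypothetical norms: the flows considered appropriate in each context.
NORMS = {
    Flow("healthcare", "patient", "doctor", "symptoms"),
    Flow("healthcare", "doctor", "specialist", "symptoms"),
    Flow("friendship", "user", "friend", "photos"),
}


def respects_norms(flow: Flow) -> bool:
    """A flow is appropriate only if the context's norms include it."""
    return flow in NORMS


ok = Flow("healthcare", "patient", "doctor", "symptoms")
leak = Flow("healthcare", "doctor", "advertiser", "symptoms")
print(respects_norms(ok), respects_norms(leak))
```

The point of the model is that the same piece of information can be
appropriate in one flow and inappropriate in another, so the
question is about the flow as a whole and not only about who holds
the data.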

Values may not be immediately deducible from the techni- 
cal design of a system, but may instead only be observable 
empirically. Indeed, we suggest that much of the reason for 
what we see as overenthusiastic claims about decentralized 
systems is that design characteristics have been confused 
with values. We discuss two prominent cases in detail in 
Sections 4.1 and 4.2. Moreover, we doubt whether any architecture
could optimize for all values simultaneously.



4. DRAWBACKS OF DECENTRALIZATION 

In this section we present some underappreciated drawbacks 
of decentralized architectures. Not all of these apply to all 
types of systems, nor is any of them individually a deci- 
sive factor. But collectively they may help explain why 
decentralization faces a steep road ahead, and why even if 
adopted, decentralization will not necessarily provide all the 
benefits that its proponents believe will automatically flow 
from it. 

An architecture without a single point of data aggregation, 
management and control has several technical disadvan- 
tages. First is functionality: there are several types of com- 
putations that are hard or impossible without a unified view 
of the data. Detection of fraud and spam, search, collabo- 
rative filtering, identification of trending topics and other 
types of analytics are all examples. Decentralized systems 
also suffer from inherently higher network unreliability,
resulting in a tradeoff between consistency and availability
(formalized as the CAP theorem [57]); they may also be
slower from the user's point of view.^7 The need for
synchronized clocks and minimizing data duplication are other
challenges.

The benefits and costs of standardization are a prominent 
socio-technical factor. Many decentralized systems depend 
on multiple interoperating pieces of software, which requires 
standardization of technical protocols, design decisions, etc. 
On the one hand, such an ecosystem could promote long- 
term innovation; on the other hand, these processes (e.g., 
HTML5) move at a far slower pace than Facebook or an 
ad network which can roll out features over the timespan of 
days or weeks. Shapiro notes two benefits of standardiza- 
tion: greater realization of network effects and protection of 
buyers from stranding, and one cost: constraints on variety 
and innovation, and argues that the impact on competition 
can be either a benefit or a cost [50].

Let us now turn to economics. Centralized systems have
significant economies of scale, which encompass hosting
costs, development costs and maintenance costs (e.g.,
combating malware and spam),^8 branding and advertising [42].
A related point in the context of social networks: we hy- 
pothesize that the network effect is stronger for centralized 
systems due to tighter integration. 

Path dependence is another key economic issue: even if we 
assume that centralized and decentralized architectures rep- 
resent equally viable equilibria, which one is actually reached 
might be entirely a consequence of historical accident. Most 
of the systems under our purview - unlike, say, email - were 
initially envisioned as commercial applications operating un- 
der central control, and it is unsurprising they have stayed 
that way. 

The theory of unraveling suggests that infomediaries in par- 
ticular might not in fact represent a stable equilibrium. For 
an infomediary to succeed, consumers and businesses must 

7 Google reports that users exposed to an additional delay
of as little as 100ms performed a statistically significantly
smaller number of searches [44].

8 Facebook has built a highly sophisticated real time "im- 
mune system" which relies in part on human operators [51] . 



transact through the intermediary rather than directly with 
each other. But either side of this market might see par- 
ticipants iteratively defecting, resulting in unraveling of the 
market. Chen et al. discuss how this might happen from
the businesses' side [11], and Peppet discusses it from the
consumer side [43]. However, it is not fully clear why many
types of intermediaries have taken hold in many other markets
(employment agents, goods appraisers, etc.) but not
in the market for personal data.

A variety of cognitive factors hinder adoption of decen- 
tralized systems as well. First, the fact that decentralized 
systems typically require software installation is a significant 
barrier. Second, more control over personal data almost in- 
evitably translates to more decisions, which leads to cogni- 
tive overload. Third, since users lack expertise in software 
configuration, security vulnerabilities may result. A related 
point is that users may be unable to meaningfully verify 
privacy guarantees provided through cryptography. 

Finally, we find that decentralized social networking systems 
in particular fare poorly in terms of mapping the norms of 
information flow. Access control provides a very limited con- 
ception of privacy. We provide several examples. First is the 
idea of "degrees of publicness." For example, on Facebook 
a post may be publicly visible, yet the site has defenses to 
stop crawlers which prevents the post ending up in a search 
engine cache, so that the user may meaningfully hide or 
delete the post later if they so choose. Second, in current 
social networks privacy is achieved not only through tech- 
nical defenses but also through "nudges" [36]. When there 
are multiple software implementations, users cannot rely on 
their friends' software providing these nudges. Third,
distributed social networks reveal users' content consumption
to their peers who host the content^9 (unless they have a
"push" architecture where users always download accessible
content, whether they view it or not, which is highly
inefficient). Finally, decentralized social networks make reputation
management and "privacy through obscurity" (in the
sense of [26]) harder, due to factors such as the difficulty of 
preventing public, federated data from showing up in search 
results. 
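
The third example above, in which hosting peers learn what content
a user consumes, can be illustrated with a toy comparison of "pull"
and "push" retrieval. The Host class, the function names, and the
post identifiers are hypothetical; the sketch only shows what a
hosting peer could observe in its own request log.

```python
class Host:
    """A peer that hosts posts and sees every request it serves."""

    def __init__(self, posts):
        self.posts = dict(posts)
        self.request_log = []

    def fetch(self, requester, post_id):
        self.request_log.append((requester, post_id))
        return self.posts[post_id]


def pull_read(host, reader, viewed_ids):
    """Pull: fetch only what the reader actually views."""
    return {pid: host.fetch(reader, pid) for pid in viewed_ids}


def push_sync(host, reader):
    """Push-style: download everything accessible, viewed or not."""
    return {pid: host.fetch(reader, pid) for pid in list(host.posts)}


posts = {"p1": "holiday photos", "p2": "job news", "p3": "health update"}

pull_host = Host(posts)
pull_read(pull_host, "bob", ["p2"])
# The log shows exactly which post bob chose to view.
print(pull_host.request_log)

push_host = Host(posts)
push_sync(push_host, "bob")
# Every post was requested, so the log does not single out what bob
# viewed, at the cost of three transfers instead of one.
print(len(push_host.request_log))
```

The push variant hides consumption only by transferring everything,
which is the inefficiency noted in the text.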

4.1 On Control over Personal Data 

We now discuss two drawbacks in detail to illustrate the 
difference between architectural decisions and social values 
that they are often implicitly assumed to promote. The first 
is the distinction between control over hosting and privacy. 
To elucidate this we present a thought experiment. 

What does it mean for users to truly host and control their 
personal data, while still being able to participate in activ- 
ities such as social networking and personalized commerce? 
Compared to using Facebook, hosting one's data on a per- 
sonal EC2 instance certainly puts the user in greater control, 
but Amazon will turn over user data in response to a sub- 
poena or court order [3]. 

For any hope of absolute control, users must, at a minimum, 
host data on their own device resident on their physical prop- 

9 This is a particularly serious problem for systems like
Contrail [52].



erty. This is already considerably at odds with the reality 
of today's consumer Internet: bandwidth to the home is of- 
ten asymmetric, or connectivity is restricted in other ways 
(NATs, firewalls), and few individuals possess always-on
devices capable of running web services.^10

Furthermore, the software running the services must be
open-source, and be audited by third-party certification
authorities, or by "the crowd". Silent auto-updates, the
model that client-side software is increasingly moving
towards, would be difficult due to the auditing requirement,
perhaps prohibitively so.

Further still, hardware might have backdoors, and therefore 
needs an independent trust mechanism as well. The user 
also needs the time and knowhow to configure redundant 
backups, manage software security, etc. Finally, almost all
decentralized architectures face the problem of "downstream
abuse", which is that the user has no technical means
to exercise control over use and retransmission of data once
it has been shared [47].

This thought experiment shows that absolute control is im- 
possible in practice. Further, it suggests that control over 
information is probably not the right conceptualization of 
privacy, if privacy is the end goal. 

4.2 Open standards and Interoperability 

Interoperability is a laudable goal; it could enhance social 
utility, as we have mentioned earlier. However, it has fre- 
quently been reduced to the notion of open standards. We 
argue here that while open standards are a prerequisite for 
interoperability, there is a big gap between the two. In par- 
ticular, the efforts at federated social networking all follow 
open standards, but their actual interoperability status in 
practice appears to be poor [56]. Let us examine why this 
is the case. 

One major impediment is that there are too many standards 
to choose from. For the most basic, foundational component 
— identity — there are many choices: OpenID, WebID and 
others. While it is possible to connect these to each other, 
it requires extra effort. As we get to more complex (but 
still basic) functionality such as federation of messages, we 
find on the one hand Atom/PubSubHubbub etc. and the
OStatus suite^11 on top of it, and on the other hand XMPP
and the Wave federation protocol^12 on top of it. It appears
that the former is gradually winning out, but this is a slow 
process. 

The second major impediment is that as soon as we get past 
the basics like identity, friendship and status updates, there 
is an incredible array of parameters to nail down. Take 
the apparently trivial issue of what formatting is allowed 
in a status update. Unless two systems agree on the same 
standard, they are not interoperable because users of one 
will see malformatted messages originating from the other. 
Needless to say, centralized platforms have a large and ever- 

10 It remains to be seen if smartphones will become practical 
for this use-case. 
11 http://ostatus.org/
12 http://www.waveprotocol.org/



increasing set of features — photos, video chat, polls, to 
name a few — all of which would require standardization in 
the federated context. Finally, access control in a federated 
setting presents hard technical challenges. 

The practical upshot is that the only suite of standards that 
shows any signs of meaningful interoperability is StatusNet:
microblogging is both text based, largely eliminating
the formatting issue, and typically public, sidestepping
the access-control issue, although identi.ca remains
the only implementation with meaningful adoption. Even
though this system limits status updates to text, a version 
of the formatting problem still plagues it: identi.ca restricts 
updates to 140 characters in an attempt to maintain some 
interoperability with Twitter! 
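
The formatting problem described above amounts to agreeing on a
lowest common denominator for status updates. A minimal sketch of
such a normalization step follows. The function name and the
tag-stripping rule are hypothetical and are not taken from any
federation standard; only the 140-character limit comes from the
text.

```python
import re

COMMON_LIMIT = 140  # the limit the text attributes to identi.ca


def to_common_denominator(update: str, limit: int = COMMON_LIMIT) -> str:
    """Reduce a status update to plain text that fits the shared limit."""
    text = re.sub(r"<[^>]+>", "", update)      # drop HTML-like markup
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    if len(text) > limit:
        # Truncate and mark the cut; the result is at most `limit` long.
        text = text[: limit - 3].rstrip() + "..."
    return text


print(to_common_denominator("<b>Hello</b>   federated   world"))
print(len(to_common_denominator("x" * 500)))
```

Every feature that one implementation supports and another does not
(rich formatting, longer posts, attachments) needs a rule of this
kind, which is part of why interoperability requires ongoing effort
beyond adopting an open standard.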

We conclude that while federated social networks have the 
potential to converge on a reasonably interoperable collec- 
tion of software — subject to the caveats of differing feature 
sets and parameters — it is not simply a matter of making 
some technical decisions, but instead needs serious developer 
commitment as well as the involvement of standards bodies 
with significant authority. 

5. RECOMMENDATIONS 

Based on our analysis above, we offer the following recom- 
mendations for developers of decentralized systems. 

1. Consider the economic feasibility of your design. In 
particular, are there entities with the economic incen- 
tive to play the various roles that are called for? This 
has perhaps been the most common reason for the lack 
of adoption of past proposals and projects. 

2. Pay heed to conceptual fidelity. Are you shooting at 
the right target? Do people have the values you think 
they do? Do they really want the features/benefits you 
claim they want? As one example, there have been
a multitude of projects that attempt encrypted communication
over Facebook and other social networks
(NOYB, Fly By Night [32], Lockr [53], FaceCloak
[33], Scramble! [7], etc.), but the lack of adoption suggests
that the benefits to users do not outweigh the usability costs.

3. Incorporate other notions of regulability [oTll31]. Many
decentralized systems represent an extreme choice: they 
seek to achieve privacy and other properties purely 
through technology, ignoring socio-legal approaches. 
This extreme may not be optimal. 

4. Offer advantages other than privacy to users. Privacy 
is always a secondary feature — while it might tip 
the balance between two competing products, users 
rarely pick a product based on privacy alone. For ex- 
ample, distributed social networking can enable some 
location-specific functionalities through peer-to-peer net- 
working even when there is no Internet access. 

5. Design with standardization in mind. One of the dis- 
advantages we have identified is the proliferation of 
non-interoperable systems. Open standards are not 



enough: developers must actively prioritize interoper- 
ability and write and maintain glue code to interface 
with other systems. 

6. Target limited feature sets. A system like Facebook 
is a large, complex moving target. Attempting to cre- 
ate a decentralized version of it is a futile endeavor. 
Instead, systems that embody the 'minimum viable 
product' strategy might succeed better in the market. 
Decentralized microblogging appears to be a relatively 
attainable goal at the present time, and censorship re- 
sistance is a goal for which there is much demand. 

7. Work with regulators. As numerous law/economics 
scholars have pointed out, market solutions appear to 
underprovide privacy and regulation can help tweak 
the environment to address this imbalance [46] . Those 
who wish to see the personal data ecosystem flour- 
ish would do well to support regulatory interventions 
such as transparency and opt-out that can help level 
the playing field between centralized and decentralized 
systems. 

6. CONCLUSION 

In this position paper we have taken a look back at the efforts 
to build decentralized personal data architectures motivated 
either by discontent with the status quo, or as a better way 
to organize information markets and leverage new commer- 
cial opportunities, or a combination of both. We hope we 
have provided some mental clarity to the reader on the simi- 
larities, differences and common themes between the various 
systems and brought fresh perspective to the question of why 
they have largely floundered. 

We hope to kick off a more tempered discussion of the future 
of personal data architectures in both scholarly and hob- 
byist/entrepreneurial circles, one that is informed by the 
lessons of history. There is much work to be done along 
these lines — application of economic theory can shed light 
on questions such as the relative strength of network effects 
in centralized vs. decentralized systems. Empirical method- 
ology such as user and developer interviews would also be 
tremendously valuable. While we have provided some sug- 
gestions for developers, in the future we hope to identify 
specific application domains that are relatively amenable to 
the adoption of decentralized architectures, as well as to pro- 
vide concrete recommendations for policymakers who might 
wish to foster a different market equilibrium. 